{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# PyTorch Gradients\n",
    "This section covers the PyTorch <a href='https://pytorch.org/docs/stable/autograd.html'><strong><tt>autograd</tt></strong></a> implementation of gradient descent. Tools include:\n",
    "* <a href='https://pytorch.org/docs/stable/autograd.html#torch.autograd.backward'><tt><strong>torch.autograd.backward()</strong></tt></a>\n",
    "* <a href='https://pytorch.org/docs/stable/autograd.html#torch.autograd.grad'><tt><strong>torch.autograd.grad()</strong></tt></a>\n",
    "\n",
    "Before continuing in this section, be sure to watch the theory lectures to understand the following concepts:\n",
    "* Error functions (step and sigmoid)\n",
    "* One-hot encoding\n",
    "* Maximum likelihood\n",
    "* Cross entropy (including multi-class cross entropy)\n",
    "* Back propagation (backprop)\n",
    "\n",
    "<div class=\"alert alert-info\"><h3>Additional Resources:</h3>\n",
    "<strong>\n",
    "<a href='https://pytorch.org/docs/stable/notes/autograd.html'>PyTorch Notes:</a></strong>&nbsp;&nbsp;<font color=black>Autograd mechanics</font></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Autograd - Automatic Differentiation\n",
    "In previous sections we created tensors and performed a variety of operations on them, but we did nothing to store the sequence of operations, or to apply the derivative of a completed function.\n",
    "\n",
    "In this section we'll introduce the concept of the <em>dynamic computational graph</em> which is comprised of all the <em>Tensor</em> objects in the network, as well as the <em>Functions</em> used to create them. Note that only the input Tensors we create ourselves will not have associated Function objects.\n",
    "\n",
    "The PyTorch <a href='https://pytorch.org/docs/stable/autograd.html'><strong><tt>autograd</tt></strong></a> package provides automatic differentiation for all operations on Tensors. This is because operations become attributes of the tensors themselves. When a Tensor's <tt>.requires_grad</tt> attribute is set to True, it starts to track all operations on it. When an operation finishes you can call <tt>.backward()</tt> and have all the gradients computed automatically. The gradient for a tensor will be accumulated into its <tt>.grad</tt> attribute.\n",
    "    \n",
    "Let's see this in practice."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Back-propagation on one step\n",
    "We'll start by applying a single polynomial function $y = f(x)$ to tensor $x$. Then we'll backprop and print the gradient $\\frac {dy} {dx}$.\n",
    "\n",
    "$\\begin{split}Function:\\quad y &= 2x^4 + x^3 + 3x^2 + 5x + 1 \\\\\n",
    "Derivative:\\quad y' &= 8x^3 + 3x^2 + 6x + 5\\end{split}$\n",
    "\n",
    "#### Step 1. Perform standard imports"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-11-26T02:39:08.775321Z",
     "start_time": "2024-11-26T02:39:07.437085Z"
    }
   },
   "source": [
    "import torch"
   ],
   "outputs": [],
   "execution_count": 1
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Step 2. Create a tensor with <tt>requires_grad</tt> set to True\n",
    "This sets up computational tracking on the tensor."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-11-26T02:39:30.079649Z",
     "start_time": "2024-11-26T02:39:30.041137Z"
    }
   },
   "source": [
    "x = torch.tensor(2.0, requires_grad=True)\n",
    "x"
   ],
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor(2., requires_grad=True)"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "execution_count": 2
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Step 3. Define a function"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-11-26T02:39:32.979129Z",
     "start_time": "2024-11-26T02:39:32.972224Z"
    }
   },
   "source": [
    "y = 2*x**4 + x**3 + 3*x**2 + 5*x + 1\n",
    "\n",
    "print(y)"
   ],
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor(63., grad_fn=<AddBackward0>)\n"
     ]
    }
   ],
   "execution_count": 3
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since $y$ was created as a result of an operation, it has an associated gradient function accessible as <tt>y.grad_fn</tt><br>\n",
    "The calculation of $y$ is done as:<br>\n",
    "\n",
    "$\\quad y=2(2)^4+(2)^3+3(2)^2+5(2)+1 = 32+8+12+10+1 = 63$\n",
    "\n",
    "This is the value of $y$ when $x=2$.\n",
    "\n",
    "#### Step 4. Backprop"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "y.backward()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Step 5. Display the resulting gradient"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor(93.)\n"
     ]
    }
   ],
   "source": [
    "print(x.grad)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that <tt>x.grad</tt> is an attribute of tensor $x$, so we don't use parentheses. The computation is the result of<br>\n",
    "\n",
    "$\\quad y'=8(2)^3+3(2)^2+6(2)+5 = 64+12+12+5 = 93$\n",
    "\n",
    "This is the slope of the polynomial at the point $(2,63)$.\n",
    "\n",
    "## Back-propagation on multiple steps\n",
    "Now let's do something more complex, involving layers $y$ and $z$ between $x$ and our output layer $out$.\n",
    "#### 1. Create a tensor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[1., 2., 3.],\n",
      "        [3., 2., 1.]], requires_grad=True)\n"
     ]
    }
   ],
   "source": [
    "x = torch.tensor([[1.,2,3],[3,2,1]], requires_grad=True)\n",
    "print(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2. Create the first layer with $y = 3x+2$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[ 5.,  8., 11.],\n",
      "        [11.,  8.,  5.]], grad_fn=<AddBackward0>)\n"
     ]
    }
   ],
   "source": [
    "y = 3*x + 2\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3. Create the second layer with $z = 2y^2$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[ 50., 128., 242.],\n",
      "        [242., 128.,  50.]], grad_fn=<MulBackward0>)\n"
     ]
    }
   ],
   "source": [
    "z = 2*y**2\n",
    "print(z)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 4. Set the output to be the matrix mean"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor(140., grad_fn=<MeanBackward0>)\n"
     ]
    }
   ],
   "source": [
    "out = z.mean()\n",
    "print(out)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 5. Now perform back-propagation to find the gradient of x w.r.t out\n",
    "(If you haven't seen it before, w.r.t. is an abbreviation of <em>with respect to</em>)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[10., 16., 22.],\n",
      "        [22., 16., 10.]])\n"
     ]
    }
   ],
   "source": [
    "out.backward()\n",
    "print(x.grad)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You should see a 2x3 matrix. If we call the final <tt>out</tt> tensor \"$o$\", we can calculate the partial derivative of $o$ with respect to $x_i$ as follows:<br>\n",
    "\n",
    "$o = \\frac {1} {6}\\sum_{i=1}^{6} z_i$<br>\n",
    "\n",
    "$z_i = 2(y_i)^2 = 2(3x_i+2)^2$<br>\n",
    "\n",
    "To solve the derivative of $z_i$ we use the <a href='https://en.wikipedia.org/wiki/Chain_rule'>chain rule</a>, where the derivative of $f(g(x)) = f'(g(x))g'(x)$<br>\n",
    "\n",
    "In this case<br>\n",
    "\n",
    "$\\begin{split} f(g(x)) &= 2(g(x))^2, \\quad &f'(g(x)) = 4g(x) \\\\\n",
    "g(x) &= 3x+2, &g'(x) = 3 \\\\\n",
    "\\frac {dz} {dx} &= 4g(x)\\times 3 &= 12(3x+2) \\end{split}$\n",
    "\n",
    "Therefore,<br>\n",
    "\n",
    "$\\frac{\\partial o}{\\partial x_i} = \\frac{1}{6}\\times 12(3x+2)$<br>\n",
    "\n",
    "$\\frac{\\partial o}{\\partial x_i}\\bigr\\rvert_{x_i=1} = 2(3(1)+2) = 10$\n",
    "\n",
    "$\\frac{\\partial o}{\\partial x_i}\\bigr\\rvert_{x_i=2} = 2(3(2)+2) = 16$\n",
    "\n",
    "$\\frac{\\partial o}{\\partial x_i}\\bigr\\rvert_{x_i=3} = 2(3(3)+2) = 22$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Turn off tracking\n",
    "There may be times when we don't want or need to track the computational history.\n",
    "\n",
    "You can reset a tensor's <tt>requires_grad</tt> attribute in-place using `.requires_grad_(True)` (or False) as needed.\n",
    "\n",
    "When performing evaluations, it's often helpful to wrap a set of operations in `with torch.no_grad():`\n",
    "\n",
    "A less-used method is to run `.detach()` on a tensor to prevent future computations from being tracked. This can be handy when cloning a tensor."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "<div class=\"alert alert-info\"><strong>A NOTE ABOUT TENSORS AND VARIABLES:</strong> Prior to PyTorch v0.4.0 (April 2018) Tensors (<tt>torch.Tensor</tt>) only held data, and tracking history was reserved for the Variable wrapper (<tt>torch.autograd.Variable</tt>). Since v0.4.0 tensors and variables have merged, and tracking functionality is now available through the <tt>requires_grad=True</tt> flag.</div>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
